VAD-measure-embedded decoder with online model adaptation

Authors

Tasuku Oonishi

Koji Iwano

Sadaoki Furui

Abstract

We previously proposed a decoding method for automatic speech recognition that weights hypothesis scores by voice activity detection (VAD) measures. This method uses two Gaussian mixture models (GMMs) to obtain confidence measures: one for speech, the other for non-speech. To achieve good search performance, the GMMs must be adapted properly to the input utterances and the environmental noise. We describe a new unsupervised online GMM adaptation method based on maximum a posteriori (MAP) estimation. The robustness of the method is further improved by weighting the GMM parameter updates according to the confidence measure for the adaptation data. We also describe an approach that accelerates adaptation by caching the statistics used to adapt the GMMs. Experimental results on the Drivers’ Japanese Speech Corpus in a Car Environment (DJSC) show that decoding with adaptation significantly improves word accuracy from 54.8% to 59.6%. Moreover, the weighting method improves the robustness of the unsupervised adaptation, and the cache method greatly accelerates the decoding process. Consequently, our adaptive decoding method significantly improves word accuracy in a noisy environment with only a minor increase in computational cost.
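The abstract does not give the update equations, so the following is only a minimal sketch of what a confidence-weighted MAP mean update could look like for a one-dimensional GMM. It uses the standard MAP mean formula, mu_new = (tau * mu_prior + sum of gamma * x) / (tau + sum of gamma), and scales each frame's component posterior gamma by a per-frame confidence. The function names, the relevance factor `tau`, and the way the confidence enters the update are all assumptions made for illustration, not the authors' formulation.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def map_adapt_means(weights, means, variances, frames, confidences, tau=10.0):
    """Confidence-weighted MAP update of the means of a 1-D GMM.

    Hypothetical illustration: `tau` (relevance factor) and the scaling of
    posteriors by `confidences` are assumptions, not taken from the paper.
    """
    num_components = len(means)
    occupancy = [0.0] * num_components    # sum of weighted posteriors
    first_order = [0.0] * num_components  # sum of weighted posteriors * x

    for x, conf in zip(frames, confidences):
        likes = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(likes)
        if total <= 0.0:
            continue  # frame is numerically unexplained by every component
        for k in range(num_components):
            gamma = conf * likes[k] / total
            occupancy[k] += gamma
            first_order[k] += gamma * x

    # Interpolate between the prior mean and the weighted data mean.
    return [(tau * m + f) / (tau + n)
            for m, f, n in zip(means, first_order, occupancy)]


if __name__ == "__main__":
    adapted = map_adapt_means(
        weights=[0.5, 0.5], means=[0.0, 5.0], variances=[1.0, 1.0],
        frames=[0.4, 0.6, 5.5], confidences=[1.0, 0.9, 0.2], tau=2.0)
    print(adapted)
```

In this sketch the accumulators `occupancy` and `first_order` are the only quantities the update needs, so they could be kept between utterances instead of being recomputed. Whether that matches the caching scheme described in the abstract is not something the abstract confirms.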

Related resources

Robust speech recognition using VAD-measure-embedded decoder

In a speech recognition system a Voice Activity Detector (VAD) is a crucial component for not only maintaining accuracy but also for reducing computational consumption. Front-end approaches which drop non-speech frames typically attempt to detect speech frames by utilizing speech/non-speech classification information such as the zero crossing rate or statistical models. These approaches discard...

Mixed decision-based noise adaptation for speech enhancement

Introduction: Frequency domain speech enhancement is focused mainly on improved estimation of spectral attenuation factors with the assumption of given noise statistics. However, in practice, the noise statistics exhibit fluctuations from frame to frame. Thus, a method for robust estimation of the noise statistics is investigated in this Letter. Conventional noise estimation can be classified i...

Proficient BMI Control Enabled by Closed-Loop Adaptation of an Optimal Feedback-Controlled Point Process Decoder

Much progress has been made in brain-machine interface (BMI) development using closed-loop decoder adaptation (CLDA) methods. CLDA fits the decoder parameters during closed-loop BMI operation based on the neural activity and inferred user velocity intention. This progress has resulted in the recent high-performance ReFIT Kalman filter (ReFIT KF) [1]. Here we develop an adaptive optimal feedback...

Voice Activity Detection Using Speech Recognizer Feedback

This paper demonstrates how feedback from a speech recognizer can be leveraged to improve Voice Activity Detection (VAD) for online speech recognition. First, reliably transcribed segments of audio are fed back by the recognizer as supervision for VAD model adaptation. This allows the much stronger LVCSR acoustic models to be harnessed without adding computation. Second, when to make a VAD deci...

Voice Activity Detection Based on Discriminative Weight Training Incorporating a Spectral Flatness Measure

In this paper, we present an approach to incorporate discriminative weight training into a statistical model-based voice activity detection (VAD) method. In our approach, the VAD decision rule is derived from the optimally weighted likelihood ratios (LRs) using a minimum classification error (MCE) method. An adaptive online means of selecting two kinds of weights based on a power spectral flatn...

Publication date: 2010
